
Introspection #58

Merged
merged 24 commits into main from real_time_benchmarks on May 7, 2021
Conversation

@ctfhacker (Collaborator) commented Apr 18, 2021

This PR adds the perf_stats feature to libafl, which enables performance metrics for each fuzzing instance. It should mostly be used for debugging, to understand what percentage of time is spent where in the fuzzing pipeline.

Default metrics:

# fuzzers/libfuzzer_libpng/Cargo.toml
libafl = { path = "../../libafl/", features = ["default"] }
[Stats #1] clients: 5, corpus: 950, objectives: 0, executions: 181549, exec/sec: 0
[Stats #1] clients: 5, corpus: 1104, objectives: 0, executions: 361607, exec/sec: 361607
[Stats #1] clients: 5, corpus: 1218, objectives: 0, executions: 529569, exec/sec: 264784
[Stats #1] clients: 5, corpus: 1293, objectives: 0, executions: 698878, exec/sec: 232958

With performance metrics:

# fuzzers/libfuzzer_libpng/Cargo.toml
libafl = { path = "../../libafl/", features = ["default", "perf_stats"] }
[PerfStats #1] clients: 5, corpus: 1425, objectives: 0, executions: 1699447, exec/sec: 205088
Client 001: Scheduler: 0.15 | Manager: 0.00 | Stages:
  Stage 0:
    0.0030: GetInputFromCorpus
    0.0713: Mutate
    0.0004: MutatePostExec
    0.1333: TargetExecution
    0.0113: PreExec
    0.0082: PostExec
    0.0048: PreExecObservers
    0.1051: PostExecObservers
    0.4978: GetFeedbackInterestingAll
    0.0004: GetObjectivesInterestingAll
  Not Measured: 0.01
Client 002: Scheduler: 0.14 | Manager: 0.00 | Stages:
  Stage 0:
    0.0031: GetInputFromCorpus
    0.0727: Mutate
    0.0004: MutatePostExec
    0.1321: TargetExecution
    0.0113: PreExec
    0.0081: PostExec
    0.0050: PreExecObservers
    0.1073: PostExecObservers
    0.5086: GetFeedbackInterestingAll
    0.0004: GetObjectivesInterestingAll
  Not Measured: 0.01

The metrics are stored in the ClientPerfStats struct, which is added to the ClientStats struct only when the perf_stats feature is enabled. Timing is performed using rdtsc on x86 (no other archs have been implemented thus far) via cpu::read_time_counter, which is also only built for x86. Other archs will hit a compile-time error because this function does not exist for them.

The bulk of the new code is in State::execute_input, where timers are added (only with the perf_stats feature) around each phase of fuzzing an input. This way, researchers can understand the performance ramifications of the components under experiment. The sketch below illustrates the pattern.
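As a hedged illustration (helper names here are invented for the sketch and are not the actual libafl API): each phase is bracketed by two reads of the time-stamp counter, and the delta is accumulated into a per-phase slot.

    // Minimal sketch, x86_64-only like the PR; names are illustrative.
    #[cfg(target_arch = "x86_64")]
    fn read_time_counter() -> u64 {
        unsafe { core::arch::x86_64::_rdtsc() }
    }

    /// Run one phase and add its elapsed cycles to the given accumulator.
    fn time_phase<T>(slot: &mut u64, phase: impl FnOnce() -> T) -> T {
        let start = read_time_counter();
        let result = phase();
        *slot += read_time_counter() - start;
        result
    }

    fn main() {
        let mut target_execution_cycles = 0u64;
        time_phase(&mut target_execution_cycles, || {
            // ... run the target on the mutated input (the TargetExecution phase) ...
        });
        println!("TargetExecution: {} cycles", target_execution_cycles);
    }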

The tests were done in the libfuzzer_libpng example.

Performance impact of performance monitoring

The following experiment was run with the libfuzzer_libpng example fuzzer.

These were executed on the following CPU:

Intel(R) Core(TM) i5-6440HQ CPU @ 2.60GHz

The stats here are a bit odd: with perf_stats enabled, exec/sec appears to be higher. I haven't quite figured out why yet, but I was hoping someone else could verify on their system as well.

Test without perf_stats with 1 fuzzer

timeout 30s ./fuzzer &
timeout 29s ./fuzzer >/dev/null 2>/dev/null &
[Stats #1] clients: 2, corpus: 351, objectives: 0, executions: 1257056, exec/sec: 47069

Test with perf_stats with 1 fuzzer

timeout 30s ./fuzzer &
timeout 29s ./fuzzer >/dev/null 2>/dev/null &
[PerfStats #1] clients: 2, corpus: 391, objectives: 0, executions: 1447825, exec/sec: 64405
Client 001: Scheduler: 0.13 | Manager: 0.00 | Stages:
  Stage 0:
    0.0040: GetInputFromCorpus
    0.0965: Mutate
    0.0005: MutatePostExec
    0.1782: TargetExecution
    0.0310: PreExec
    0.0261: PostExec
    0.0074: PreExecObservers
    0.1089: PostExecObservers
    0.0005: GetObjectivesInterestingAll
    0.4103: Feedback index 0
    0.0005: Feedback index 1
  Not Measured: 0.01

Test without perf_stats with 3 fuzzers

timeout 30s ./fuzzer &
timeout 29s ./fuzzer >/dev/null 2>/dev/null &
timeout 29s ./fuzzer >/dev/null 2>/dev/null &
timeout 29s ./fuzzer >/dev/null 2>/dev/null &
[Stats #1] clients: 4, corpus: 1293, objectives: 0, executions: 3291898, exec/sec: 137719

Test with perf_stats with 3 fuzzers

timeout 30s ./fuzzer &
timeout 29s ./fuzzer >/dev/null 2>/dev/null &
timeout 29s ./fuzzer >/dev/null 2>/dev/null &
timeout 29s ./fuzzer >/dev/null 2>/dev/null &
[PerfStats #1] clients: 4, corpus: 1205, objectives: 0, executions: 3863856, exec/sec: 153861
Client 001: Scheduler: 0.18 | Manager: 0.00 | Stages:
  Stage 0:
    0.0035: GetInputFromCorpus
    0.0862: Mutate
    0.0004: MutatePostExec
    0.1977: TargetExecution
    0.0274: PreExec
    0.0235: PostExec
    0.0064: PreExecObservers
    0.0984: PostExecObservers
    0.0004: GetObjectivesInterestingAll
    0.3704: Feedback index 0
    0.0004: Feedback index 1
  Not Measured: 0.01
Client 002: Scheduler: 0.16 | Manager: 0.00 | Stages:
  Stage 0:
    0.0035: GetInputFromCorpus
    0.0881: Mutate
    0.0004: MutatePostExec
    0.1980: TargetExecution
    0.0280: PreExec
    0.0242: PostExec
    0.0071: PreExecObservers
    0.1009: PostExecObservers
    0.0004: GetObjectivesInterestingAll
    0.3800: Feedback index 0
    0.0004: Feedback index 1
  Not Measured: 0.01
Client 003: Scheduler: 0.14 | Manager: 0.00 | Stages:
  Stage 0:
    0.0036: GetInputFromCorpus
    0.0901: Mutate
    0.0004: MutatePostExec
    0.2008: TargetExecution
    0.0288: PreExec
    0.0247: PostExec
    0.0067: PreExecObservers
    0.1028: PostExecObservers
    0.0004: GetObjectivesInterestingAll
    0.3888: Feedback index 0
    0.0004: Feedback index 1
  Not Measured: 0.01

@domenukk (Member):

Seems like all it's missing is a good old cargo fmt

@domenukk (Member) left a review:

Cool stuff

fuzzers/libfuzzer_libpng/src/lib.rs (outdated, resolved)
libafl/src/fuzzer.rs (outdated, resolved)
const NUM_STAGES: usize = 4;

/// Number of feedback mechanisms to measure for performance
pub const NUM_FEEDBACKS: usize = 4;
Member:

Maybe this can be generated at build time somehow? Since feedback tuples are generated at build time.

Collaborator Author:

Hmm, yeah, I'll start looking into how this could be reworked to be known at compile time, specific to the target being compiled.

Collaborator Author:

Initially, I think this would clog up the code a bit, since I think we would have to have something like:

pub struct ClientPerfStats<const S: usize, const F: usize> {
    // ...
    stages_used: [bool; S],
    // ... etc.
}

which would then mean ClientPerfs also needs it:

struct ClientPerfs<const S: usize, const F: usize> {
    // ...
    perf_stats: ClientPerfStats<S, F>,
}

which would then mean all instances of ClientStats would have these across the entire code base.

I think we could have an assert on State creation that checks whether the number of stages or feedbacks exceeds the number of allotted slots in the perf stats, and prints an error if that is the case. Not quite as automatic, though, and it would require the researcher to change the constant if the error triggered.

Thoughts?

Collaborator Author:

Thinking a bit more about this... I think we would need some const fn len() on FeedbackTuple and StagesTuple. I don't think const fn can be used in a trait (rust-lang/rfcs#2632), so I think the only option here is to add a run-time check that fails if too many stages or feedbacks are added.
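A minimal sketch of that run-time check, using the constants from the snippet above (the function name and call site are hypothetical):

    const NUM_STAGES: usize = 4;
    const NUM_FEEDBACKS: usize = 4;

    /// Fail fast at State creation if the tuples outgrow the preallocated slots.
    fn check_perf_capacity(stages_len: usize, feedbacks_len: usize) {
        assert!(
            stages_len <= NUM_STAGES,
            "perf_stats: {stages_len} stages > NUM_STAGES ({NUM_STAGES}); raise the constant"
        );
        assert!(
            feedbacks_len <= NUM_FEEDBACKS,
            "perf_stats: {feedbacks_len} feedbacks > NUM_FEEDBACKS ({NUM_FEEDBACKS}); raise the constant"
        );
    }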

Member:

Now, if you sync from dev, HasLen has a const LEN field; you can just do ST::LEN for the const len of the StagesTuple, for instance.

Member:

FWIW, just tried it, but ST::LEN doesn't work everywhere yet. It's not available in some places, and we would need nightly features in others. We could use Vecs instead of arrays, though; not sure if/how much this would impact perf.
Or just leave it as-is...

Collaborator Author:

The arrays are there to ensure we don't do any allocations. I don't mind either way, really. We could always just do a Vec::with_capacity, I guess (see the sketch below).
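A small sketch of that option, assuming the counts are known at State creation (the function is hypothetical): one up-front allocation, so the hot path never reallocates.

    /// Pre-size the per-stage and per-feedback timing buffers once.
    fn make_timing_buffers(num_stages: usize, num_feedbacks: usize) -> (Vec<u64>, Vec<u64>) {
        let mut stages = Vec::with_capacity(num_stages);
        stages.resize(num_stages, 0);
        let mut feedbacks = Vec::with_capacity(num_feedbacks);
        feedbacks.resize(num_feedbacks, 0);
        (stages, feedbacks)
    }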


/// Update the total stage counter and increment the stage counter for the next stage
pub fn finish_stage(&mut self) {
// Increment the stage to the next index. The check is only done if this were to
Member:

These short functions may benefit from #[inline] (maybe not necessary with LTO builds?).

Collaborator Author:

Yeah, I can add a few just in case for the small ones.

@s1341 (Collaborator) commented Apr 19, 2021:

Awesome! What is necessary to make this aarch64 compatible?

/// the fuzzer.
#[cfg(target_arch = "x86_64")]
pub fn read_time_counter() -> u64 {
    unsafe { core::arch::x86_64::_rdtsc() }
}
Collaborator:

Might be worth using https://docs.rs/llvmint/0.0.3/llvmint/fn.readcyclecounter.html, as that will automatically do the right thing for each arch.
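Roughly how that swap might look, assuming the unsafe fn readcyclecounter() -> i64 signature shown in the linked docs (and noting, per the discussion below, that llvmint needs nightly):

    pub fn read_time_counter() -> u64 {
        // llvm.readcyclecounter returns i64; widen to the u64 used elsewhere.
        unsafe { llvmint::readcyclecounter() as u64 }
    }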

Member:

Sounds really useful for more obscure platforms!
We should only add the dependency for this feature flag, though (and maybe even only for the platforms we cannot reach otherwise?).

Collaborator Author:

Oh, sure! I didn't know about the LLVM intrinsic. If we're okay with the llvmint dependency, then I'm fine with using it.

Collaborator Author:

The original thought was to add those specific instructions per arch, similar to Google's benchmark: https://github.com/google/benchmark/blob/master/src/cycleclock.h

I think that would avoid the external dependency for just one instruction per arch.
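A hedged sketch of what that per-arch route could look like in Rust; the aarch64 path reads cntvct_el0, as Google's cycleclock.h does, and needs inline asm (unstable at the time of this thread, later stabilized as asm!):

    #[cfg(target_arch = "x86_64")]
    pub fn read_time_counter() -> u64 {
        unsafe { core::arch::x86_64::_rdtsc() }
    }

    #[cfg(target_arch = "aarch64")]
    pub fn read_time_counter() -> u64 {
        // Virtual counter register; the closest aarch64 analogue to rdtsc.
        let vct: u64;
        unsafe { core::arch::asm!("mrs {}, cntvct_el0", out(reg) vct) };
        vct
    }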

Collaborator:

But that would require asm... which is not stable.

Member:

Let's go with:

  • libc, if available for this arch
  • the llvmint dependency otherwise?

Member:

Or, since it's behind a feature flag anyway, just llvmint is fine too, if it's too much work otherwise.

Collaborator Author:

Just pushed a fix using llvmint::readcyclecounter. Looks like the feature it needs requires nightly. As far as I'm aware, you want to stay on stable, which would mean neither this nor the asm! fix would work. I guess we could also require nightly for the perf_stats feature.

Collaborator Author:

Edit from the last comment: I had the wrong cfg_attr. I've ripped out only the needed extern from llvmint, so we don't need to add that dependency directly to src/cpu.rs, and it seems to work fine now.

@@ -3,6 +3,7 @@ Welcome to libAFL
*/

#![cfg_attr(not(feature = "std"), no_std)]
#![cfg_attr(feature = "perf_stats", feature(link_llvm_intrinsics))]
Collaborator:

Isn't this feature perma-unstable? Might be worth depending on llvmint instead of requiring nightly.

Member:

I think so too. Seems like the stable and hardware-independent way to go for now.

Collaborator:

I plan to use it in libafl_frida anyways, if that helps.

Member:

Oof... llvmint depends on simdty, which uses nightly features...
Not good.
We can either write our own C wrapper or go back to the initial proposal to do it using libc...
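For reference, a minimal sketch of the C-wrapper route; the file and symbol names are hypothetical, and the build script assumes the cc crate:

    // build.rs -- compile a tiny C file that exposes the cycle counter.
    fn main() {
        cc::Build::new().file("src/cycle_counter.c").compile("cycle_counter");
    }

    // Rust side, calling the C symbol. The C file would hold something like
    //   uint64_t libafl_cycle_counter(void) { return __builtin_readcyclecounter(); }
    // (clang-only; GCC/MSVC builds would need per-arch fallbacks, as noted below).
    extern "C" {
        fn libafl_cycle_counter() -> u64;
    }

    pub fn read_time_counter() -> u64 {
        unsafe { libafl_cycle_counter() }
    }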

Collaborator Author:

OK, so we do want to enforce std features. I can convert everything to just using Instant rather than the raw u64. It won't work for no_std, obviously. I guess I forgot that not everyone runs nightly all the time ;-)

I guess asking to run on nightly for perf_stats might be a big ask as well?

Yeah, llvmint relying on simdty was the reason I ripped out just the extern in the first place.
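A minimal sketch of the Instant-based (std-only) variant mentioned above; the helper name is illustrative:

    use std::time::Instant;

    /// Time one phase in wall-clock nanoseconds instead of cycles.
    fn time_phase_nanos<T>(phase: impl FnOnce() -> T) -> (T, u64) {
        let start = Instant::now();
        let result = phase();
        (result, start.elapsed().as_nanos() as u64)
    }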

Collaborator:

I think the C wrapper is the way to go...

Member:

C wrappers are the way to go, but be careful: Rust code is compiled with LLVM, while the C code may be compiled with GCC or MSVC, depending on the build.rs.

Collaborator:

Yeah, I discovered the hard way that __builtin_thread_pointer is not universally accepted.

@domenukk (Member):

@ctfhacker how about renaming this feature to introspection? Eventually, my goal is to add metadata, like "mutation x was successful y times, and took z seconds in total".

@ctfhacker (Collaborator Author):

Not a problem on my end. Note that the current setup uses clock cycles, not seconds, so we'd have to get the clock speed to convert to seconds. Or just go straight to Duration, if that is a use case we already know about.
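For illustration, the conversion is just cycles divided by clock frequency. The 2.6 GHz constant below matches the i5-6440HQ from the benchmarks above, but it is an assumption; real code would have to query the TSC frequency rather than hardcode it:

    use std::time::Duration;

    const CLOCK_HZ: f64 = 2.6e9; // assumed, not queried

    fn cycles_to_duration(cycles: u64) -> Duration {
        Duration::from_secs_f64(cycles as f64 / CLOCK_HZ)
    }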

@domenukk (Member):

Seconds was just an example. Clock cycles are fine, as long as they're measured in the same way and it's fast doing so :)

@domenukk changed the title from "Real time benchmarks" to "Introspection" on Apr 29, 2021
@domenukk (Member) commented May 1, 2021:

I've renamed it to introspection and moved this back to the x86-only flag. If anyone needs it on non-x86, we'll need to link C for it. Any takers?

* initial b2b implementation

* no_std and clippy fixes

* b2b testcase added

* more correct testcases

* fixed b2b

* typo

* fixed unused warning
@andreafioraldi andreafioraldi changed the base branch from old_dev to dev May 6, 2021 18:19
@domenukk (Member) commented May 7, 2021:

I've done some very, very rudimentary measurements: using rdtsc was consistently faster than using the system time, but not by much (1 million execs in 10 seconds on 4 slow, overheated cores, vs. 850k using system time).
So I've added system time as a fallback, since the actual measurements were at least in the same ballpark.
We can eventually add other arch support using C code, but for now I would like to merge this, I think.

@andreafioraldi (Member) commented May 7, 2021:

I will try to remove the hardcoded NUM_FEEDBACKS if I can and then merge.

@andreafioraldi (Member):

OK, it needs a rework to remove these consts. Btw, NUM_FEEDBACKS = 4 is too few IMO; it should be 16.

@andreafioraldi andreafioraldi changed the base branch from dev to main May 7, 2021 07:59
@andreafioraldi andreafioraldi merged commit 94fa04f into main May 7, 2021
@andreafioraldi andreafioraldi mentioned this pull request May 7, 2021
@domenukk domenukk deleted the real_time_benchmarks branch May 19, 2021 13:34
@@ -230,7 +230,7 @@ where
let client = stats.client_stats_mut_for(sender_id);
client.update_corpus_size(*corpus_size as u64);
client.update_executions(*executions as u64, *time);
stats.display(event.name().to_string() + " #" + &sender_id.to_string());
// stats.display(event.name().to_string() + " #" + &sender_id.to_string());
@andreafioraldi (Member) commented May 20, 2021:

@domenukk goddammit, now we have to release again because this line was commented out

@@ -241,7 +241,35 @@ where
// TODO: The stats buffer should be added on client add.
let client = stats.client_stats_mut_for(sender_id);
client.update_executions(*executions as u64, *time);
stats.display(event.name().to_string() + " #" + &sender_id.to_string());
if sender_id == 1 {
Member:

Why this?

Member:

Shouldn't be needed, I agree.
